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INTRODUCTION 


Recent  advances  in  radar  imaging  systems  have  been  cause  for  a  renewed  interest  in  appropriate 
image  reconstruction  algorithms.  This  problem  is  somewhat  different  from  the  more  usual  (optical) 
imaging  situation  because  the  radar  data  to  be  analyzed  often  consist  of  the  in-phase  and  quadrature 
components  of  the  scattered  electromagnetic  field,  and  so  these  data  are  complex-valued.  Moreover, 
it  is  clear  from  the  coherent  nature  of  radar  that  the  full  amplitude  and  phase  properties  of  the 
reconstruction  problem  must  be  addressed  if  the  details  of  the  radar /target  interaction  are  to  be 
deciphered. 

Radar  imaging  shares  with  optical  image  reconstruction  the  two  central  problems  of  resolution 
degradation  and  sensitivity  to  noise  (see,  for  example,  References  1  through  3  and  references  cited 
therein).  The  resolution  problem  occurs  when  the  data  axe  limited,  and  overcoming  this  problem 
requires  additional  “data”  be  added  in  a  way  that  either  fills-in  the  unmeasured  information  or,  at 
least,  precludes  unwanted  artifacts.  Maximum  entropy  (or  minimum  cross-entropy)  methods  deal 
neatly  with  these  information-oriented  issues  and  enjoy  considerable  popularity  (References  4  and  5). 
Unfortunately,  while  there  has  been  some  effort  to  deal  with  noise  or  complex-valued  data  (Reference 
6),  traditional  entropy-based  approaches  are  often  inadequate. 

Noise  moderation  is  generally  accomplished  by  more  general  Bayesian  techniques.  Here,  the 
noise  effects  and  the  additional  (unmeasured)  data  are  separately  modeled  and  accounted  for.  In 
addition,  because  of  the  generality  of  Bayesian  methods,  complex-valued  data  presents  no  funda¬ 
mental  complications  and  are  often  employed. 

We  will  examine  the  issues  of  noise,  information  entropy,  and  complex-valued  data  within  the 
context  of  Bayesian  image  reconstruction.  We  will  do  this  by  considering  the  problem  of  regular¬ 
ization  of  ill-posed  linear  operator  problems  using  Bayes  risk  functionals.  After  reviewing  the  basic 
concepts  and  establishing  notation  we  will  introduce  a  Markov  random  field  (MRP)  image  model 
that  will  allow  us  to  account  for  the  possibility  of  extended  image  elements.  Our  basic  result  is  a 
regularization  functional  that  generalizes  traditional  maximum  entropy  approaches  to  the  case  of 
noisy  complex-valued  data.  A  simple  image  reconstruction  algorithm,  which  applies  these  results 
to  the  MRP  model,  is  presented  as  well  as  an  example  relevant  to  inverse  synthetic  aperture  array 
(ISAR)  imaging. 
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ILL-POSED  PROBLEMS 


For  many  ^sterns,  a  measurement  obtained  at  position  Xj  can  be  modeled  as  a  linear  operation 
by  a  kernel  AT  on  an  object  function  /: 


F(xi)=  [  K{x.i,y)f{y)dy. 
Jd 


(1) 


The  goal  in  object  reconstruction  is  to  estimate  /(y)  from  measured  data  {i^(xi)}i=i^...^iViAr2  when 
K  is  known.  This  is  an  inverse  problem  and  considerable  effort  has  been  expended  in  developing 
reliable  and  efficient  algorithms  for  its  solution  in  a  wide  variety  of  applications. 

When  the  measurements  {F(xi)}  can  be  made  with  perfect  accuracy,  then  the  inversion  problem 
is  complicated  only  by  the  fact  that  the  measurements  are  necessarily  limited  in  extent  and/or  finite 
in  number.  If  f  is  considered  to  belong  to  a  Hilbert  space,  then  Equation  1  can  be  written  as 
F  =  Kf ,  where  K  is  the  linear  operator  corresponding  to  K  that  maps  f  to  the  measurements  F. 

Unfortunately,  most  measurements  will  be  contaminated  by  noise  so  that  Equation  1  becomes 


F  =  Kf  +  n 


(2) 


where  n  represents  a  random  noise  component.  When  the  problem  is  ill-posed,  the  inverse  of  K 
may  map  small  variations  in  the  data  to  large  (and  unwarranted)  variations  in  f.  In  general,  ill- 
posedness  will  occur  whenever  K  is  compact — and  in  particular,  whenever  K  is  square-integrable 
and  nondegenerate.  Consequently,  the  inverse  problem  associated  with  Equation  1  is  often  ill-posed 
and  the  effects  of  this  noise  contamination  can  be  significant. 

A  traditional  approach  for  obtaining  an  estimate  f  of  /  is  to  choose  the  g,  from  the  set  ^  of  all 
possible  solutions,  that  minimizes  the  noise  error  “energy”: 


||F-Kff  =  inin||F-Kg||2. 


(3) 


This  method  makes  sense  even  when  K  does  not  have  a  proper  inverse  (for  example,  when  K  is  not 
square).  Moreover,  the  solution  process  implied  by  Equation  3  can  frequently  be  implemented  in  a 
straightforward  and  computationally  efficient  manner.  The  generalized  inverse  implied  by  this  “least- 
squares”  solution,  however,  is  unbounded  (unless  K  has  finite  rank)  and  so  the  effects  of  the  noise 
component  in  the  data  are  generally  unmoderated  by  ordinary  least-squares  analysis.  Successfully 
addressing  the  issue  of  noise  in  ill-posed  inverse  problems  requires  a  detailed  imderstanding  of  both 
the  noise  process  and  the  set  ^  of  acceptable  solutions.  It  is  known,  for  example,  that  by  sufficiently 
restricting  $,  we  may  be  able  to  “regularize”  the  inversion  process  and  control  the  effects  of  noise. 
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REGULARIZATION 


Because  n  is  random,  the  measured  data  will  be  a  realization  selected  from  some  infinitely  large 
ensemble  and  we  can  only  obtain  the  estimate  f  in  a  statistical  sense.  Consequently,  our  knowledge 
of  f  will  come  from  a  probability  density  p{f).  In  fact,  the  density  p(f)  is  the  estimated  “solution” 
to  the  problem  because  it  describes  what  can  be  known  about  f,  and  once  p(f)  is  determined  an 
estimate  f  can  be  obtained  in  a  variety  of  ways. 

Moderating  the  effects  of  noise  is  not  the  only  problem  associated  with  object  reconstruction: 
often  we  must  also  deal  with  the  limitations  imposed  by  incomplete  measurements.  To  see  this, 
consider  Equation  2  and  let  ki  denote  the  i*^  row  of  K.  Since  the  data  obtained  by  Equation  1  are 
necessarily  discrete  and  finite,  only  those  components  of  f  that  lie  in  the  subspace  spanned  by  the  set 
of  all  ki  contribute  to  the  measurements.  This  subspace  is  the  measurement  space  and  components 
of  f  that  lie  in  the  subspace  orthogonal  to  the  measurement  space  (the  null  space)  will  not  affect 
the  measurement.  Estimates  of  f  of  the  form  of  Equation  3  may  display  unwanted  artifacts  because 
they  make  no  effort  to  account  for  the  null  space  components  of  f. 

A  statistical  estimation  method,  which  generalizes  the  idea  behind  Equation  3,  sets  the  estimate 
f  of  f  to  minimize  the  Bayes  risk: 


B{f)  =  I  C'(f,f)pf|F(f|F)df, 


(4) 


where  ^(f ,  f )  is  the  cost  function  representing  some  difference  error  between  f  and  f ,  and  df  denotes 
an  appropriate  measure  on  the  space  of  all  f.  The  function  pf|F(f|F)  is  the  posterior  distribution  of 
f  given  the  known  values  F: 


Pf|F(f|F)  = 


PF|f(F|f)pf(f) 

Pf(F) 


(5) 


The  Bayes  risk  is  the  expected  cost  of  an  error  in  estimation  and  Bayesian  estimation  seeks  to 
minimize  this  expected  cost. 

Several  cost  functions  have  become  popular,  =  ||f  —  f|P  is  the  squared  error  and  yields 

the  minimum  variance  estimate: 


=  ^  /  l|f  -  Pf|F(f|F)  df  =  2  J{f-{)  Pf|F(f|F)  df  =  0 

=>  fMV=y"fpf|F(f|F)df. 


(6) 
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Another  frequently  employed  cost  function  sets 


C'ucF(f,f)  = 


0  ifiif-fii<i 
1  ifllf-fll>§. 


(7) 


This  is  the  uniform  cost  function  and,  in  the  limit  e  0,  leads  to  the  maximum  a  posteriori  estimate: 


6-0  9f 


=  0 


fMAP  =  arg  max  pf|F(f|F), 


(8) 


When  the  cost  function  is  symmetric  and  convex  about  its  minimum,  and  Pf|F(f  |F)  is  symmetric 
about  fMV)  then  it  can  be  shown  that  the  optimal  Bayes  estimate  is  equal  to  the  minimum  variance 
estimate  (Reference  7).  Consequently,  the  minimum  variance  estimate  is  the  optimal  estimator  for 
a  wide  variety  of  very  reasonable  circumstances. 

Of  course,  this  kind  of  estimation  methodology  presumes  that  the  specification  of  pf[F(f|F) 
in  Equation  5  is  sufficiently  nontrivial  that  the  application  of  Equation  6  or  8  is  relevant  to  the 
problem  of  moderating  the  effects  of  limited  and  corrupted  data.  This  can  only  be  accomplished 
when  the  prior  density  Pf  (f)  includes  genuine  information  about  f  that  is  not  already  contained  in 
the  measured  data.  Moreover,  the  estimates  of  Equations  6  and  8  reflect  only  limited  properties 
of  the  “complete  solution”  Pf|F(f|F):  fMV  is  the  first  moment  of  the  density  PfjF  while  fMAP  is  its 
maximum.  A  real  practical  difficulty  in  obtaining  any  estimate  f  lies  in  first  obtaining  a  meaningful 
estimate  ofpf[F(f|F). 

The  determination  of  pf|F(f|F)  requires  the  specification  of  the  factors  PF|f(F|f)  and  Pf(f). 
(The  denominator  Pf(F)  can  be  considered  to  be  a  normalizing  constant.)  The  first  of  these  factors 
is  the  a  priori  probability  density  of  the  measurement  given  the  object  f  and  is  usually  assumed  to 
be  determined  by  the  noise  statistics  Pn(n)  of  the  measurement  process.  Because  of  Equation  2,  we 
can  set 


PF|f(F|f)=p„(F-Kf) 


(9) 


when  Pn(n)  is  known.  The  second  term,  Pf  (f),  is  the  a  priori  probability  density  of  f.  This  is  the 
term  that  must  be  specified  in  a  non-trivial  manner  if  the  artifacts  caused  by  insufficient  data  axe  to 
be  ameliorated.  Unfortunately,  specifying  Pf  (f)  in  such  a  non-trivial  way  requires  that  considerable 
information  about  f  be  known  in  advance  of  the  solution  process  and  this  is  often  not  possible.  A 
more  common  approach  models  Pf  (f)  to  include  only  those  properties  known  to  be  true  for  a  large 
and  general  class  of  f.  This  generality,  however,  often  implies  considerable  ambiguity  in  actually 
specifying  a  pf  (f )  to  be  used  in  a  particular  situation.  Determining  the  “best”  pf  (f )  for  the  estimation 
problem  at  hand  is  often  a  significant  additional  problem  in  itself. 
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One  approach  to  estimation  of  p(f)  begins  by  extending  the  allowed  class  of  cost  functions 
C  — >  (5  to  include  differences  in  the  “estimation  error”  of  density  functions.  A  common  example  of 
this  is  the  cost  function 


C(p{t),p{f))  =  Inp(f)  -  Inp(f). 


(10) 


(The  quantity  -lnp(f)  is  known  as  the  “self  information”  of  p(f)  and  weights  the  information 
“importance”  of  an  event  by  its  frequency — ^the  less  frequent  event  being  the  more  important.) 

When  p  and  p  both  have  exponential  arguments  in  f ,  then  Equation  10  can  be  easily  compared 
to  Equation  4.  The  definition  of  Equation  10  is  more  general,  however,  and  the  negative  of 

B(p,p)  =  (|ffl)  p(f)  df  (11) 


is  the  so-called  cross-entropy^  or  Kullback  “distance”  between  p  and  p.  B{p,p)  measures  the  (non- 
symmetric)  information  difference  between  the  two  distributions  and  provides  a  mechanism  for 
estimating  our  model  error:  of  all  the  possible  p  that  axe  consistent  with  the  prior  information, 
choose  the  one  that  minimizes  the  information  difference  to  p. 

As  with  noise-corrupted  measurements,  the  deleterious  effects  caused  by  restricted  measure¬ 
ments  can  be  moderated  by  including  an  accurate  prior  model.  When  such  additional  information 
about  f  is  known  in  advance  of  the  measurements  (for  example,  that  an  acceptable  /  must  be 
“smooth”  or  that  f  has  a  known  support)  then  the  quality  of  the  estimated  solution  may  often  be 
dramatically  improved  by  including  it.  This  is  the  idea  behind  regularization:  rather  than  force  a 
solution  to  the  original  ill-posed  problem,  select  instead  the  solution  to  a  corresponding  well-posed 
problem  that  is  “close”  to  the  original  problem.  The  framework  for  this  approach  was  established 
in  Reference  2  and  usually  proceeds  by  seeking  estimates  that  are  equally  close  to  conflicting  cri¬ 
teria.  For  example,  if  fo  is  a  prior  estimate  of  object  parameters  incorporating  “general”  known 
properties  of  f  (including  its  null  space  components),  then  our  estimate  should  weigh  and  balance 
this  information  against  the  estimate  that  best  fits  the  measured  data. 

In  terms  of  our  original  problem  of  Equation  2,  let  Bi  (f ,  F)  denote  a  measure  of  the  “difference” 
between  f  and  the  solution  implied  by  the  measured  data.  Similarly,  let  S2(f,fo)  measure  the 
difference  between  f  and  the  a  priori  information  (expressed  symbolically  by  the  parameter  set  fo). 
Regularization  methods  seek  an  estimate  f  that  minimizes  the  functional 


^(f)  =  Bi(f;F)+AB2(f;fo),  f 


(12) 


(Here  A  >  0  is  the  regularization  parameter  and  controls  the  relative  influence  of  each  term.) 

A  recognized  difficulty  with  applying  Equation  12  is  in  the  choice  of  appropriate  and  related 
difference  functionals.  This  is  because  J5i(f,  F)  relates  an  estimate  f  to  the  data  F  whereas,  based 
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on  our  discussion,  52(f,fo)  must  often  relate  an  estimate  of  Pf|F(f|F)  to  p(f) — and  these  are  fun¬ 
damentally  different  quantities.  One  approach  to  this  problem  is  to  model  p(fi)  oc  and  use  the 
atomic  measure  over  D  so  that  integrals  become  sums.  This  is  the  so-called  “configuration  density.” 
Applied  to  Equation  11,  for  example,  this  choice  of  density  results  in  the  configuration  entropy 
difference. 

The  configuration  entropy  has  been  successfully  used  in  a  variety  of  problems  and  its  appropri¬ 
ateness  to  these  problems  has  been  examined  and  debated.  Independent  of  these  concerns,  however, 
is  the  question  of  how  to  generalize  the  method  to  include  an  f  with  non-positive,  or  even  complex¬ 
valued,  components.  Various  schemes  have  tried  p(fi)  a  |fii  and  p(fi)  oc  Re  ft,  but  these  are  generally 
ad  hoc  models  and  do  not  enjoy  a  universal  appeal.  A  better  approach,  perhaps,  is  to  use  a  Bi  (f ,  F) 
which  is  more  compliant  with  the  variation  of  pfjF(f|F). 

One  way  of  doing  this  is  to  link  a  specific  estimate  to  the  data  up-front.  For  example, 

Biipf)  =  /  l|F  -  Kff  Pf|F(f|F)  df.  (13) 


If  we  use  the  cross-entropy  functional  (Equation  11)  for  B2(f,fo),  then  Equation  12  becomes 


S(pf)  =  /  l|F  -  Kf  ||2  pf|F(f IF)  df  +  xjhi  Pf|F(f|F)  d{ 


(14) 


It  is  known  that  a  selection  rule  that  is  both  “regular”  and  “local”  must  be  of  the  form  of  a 
functional  that  minimizes  the  distance  between  the  estimated  solution  and  some  prior  guess  (Refer¬ 
ence  8).  Using  “composition  consistency”  arguments,  it  is  shown  in  Reference  8  that  selection  rules 
for  general  f  (i.e.,  not  having  strictly  positive  components)  must  be  of  the  least-squares  type.  It 
is  also  shown  (References  8-10)  that  composition  consistent  selection  rules  for  p(f)  (with  strictly 
positive  components)  must  be  of  the  cross-entropy  type.  Consequently,  our  choice  of  regularization 
functional  (Equation  14)  is  regular,  local,  and  composition  consistent.  Moreover,  given  that  our 
“distances”  are  formed  by  comparing  Kf  to  F  and  by  comparing  pp^f  to  pf  (and  while  enforcing  a 
connection  between  pp^f  and  f),  we  believe  that  the  choice  of  Equation  14  is  a  natural  one,  and  we 
shall  use  it  for  the  remainder  of  this  discussion. 


OBJECT  FUNCTION  MODELS 


Because  of  Equation  2,  the  statistics  of  n  will  determine  ppif  •  However,  the  choice  for  pf  should 
be  based  upon  a  general  understanding  of  the  possible  behavior  of  the  object  independent  of  the 
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measurement  process  (References  11-17).  If  pf  is  modeled  as  an  M-dimensional  Gaussian  with 
fo  =  {Rf,f}  so  that 


Pf(f;fo)  = 


1 

(27r)^/2VditW 


(15) 


((t)  denotes  complex-conjugate  transpose),  then  it  is  essential  that  Rf~^  be  chosen  to  represent  the 
known  structure  in  the  object  function:  the  choice  Rf  =  I/cr^  is  equivalent  to  saying  that  nothing 
is  known  about  the  structural  possibilities  of  f,  except  the  initial  guess  f . 

In  order  to  model  extended  object  structure  (for  example,  discontinuities)  while  still  account¬ 
ing  for  noise,  we  need  to  be  able  to  distinguish  between  random  variations  attributable  to  signal 
contamination  and  “jumps”  associated  with  the  behavior  of  /  over  a  domain  or  “neighborhood.” 
Specifically,  a  neighborhood  5u  €  D  of  u  is  a  set  of  points  with  the  property  that  Vu,  v  e  D  u  ^  9u 
andv€^  <!=^>  u€0v.  If  p(f)  has  the  characteristic  that 


Vu€D  p(/(u)|/(v),v  ^  u)  =p(/(u)|/(v),v  €  9u),  (16) 


then  /  is  said  to  be  an  MRF.  MRFs  are  useful  as  object  models  because  they  restrict  computation 
to  be  local  and  are  general  enough  to  model  the  behavior  of  /(u)  in  the  neighborhood  of  u.  The 
MRF  property  applied  to  Elquation  15,  for  example,  implies  that  Rf  is  a  sparse  matrix  with  zero 
values  at  locations  that  do  not  correspond  to  neighboring  object  elements. 

In  the  case  of  Gaussian  noise  the  Gaussian  object  model  is  often  chosen  because  of  its  analytical 
advantage  and  the  Gaussian  Markov  random  field  (GMRF)  seems  to  offer  the  potential  for  including 
non-trivial  extended  prior  object  structure.  When  the  object  is  a  priori  known  to  have  discontinu¬ 
ities,  however,  the  Gaussian  model  may  be  inappropriate  because  the  squared  difference  ||f  -  f  ||2 
applies  too  high  a  penalty  to  these  discontinuities  (which  will  therefore  be  excessively  smoothed). 
This  drawback  is  not  endemic  to  MRFs  in  general  and  there  are  many  candidate  prior  object  models 
that  overcome  the  GMRF  tendency  to  “over  smooth,”  but  that  retain  the  more  attractive  features 
induced  by  concentrating  on  local  object  neighborhoods. 

The  GMRF  is  only  one  example  of  a  much  larger  class  of  MRFs  known  as  Gibbs  distributions, 
which  are  defined  over  neighborhoods  in  D.  A  local  set  of  points  c  c  D  is  known  as  a  clique  if 
Vu,  V  e  c,  u  and  V  are  neighbors.  In  general,  a  Gibbs  distribution  can  be  expressed  in  the  form 


Pg{f)  =  ^ex.p(-  Fe(f)  d(^  ,  (17) 


where  Vc  is  a,  “clique  potential”  function  on  D  with  the  property  that  Vdf)  depends  only  on  the 
coordinates  u  of  /  which  lie  in  c.  The  integral  in  Equation  17  is  over  the  set  C  of  all  cliques  in  D, 
and  Z  =  fpg{f)  df  is  a  normalization  factor  known  as  the  “partition  function.”  It  is  known  that  all 
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MRF’s  defined  over  neighborhood  systems  are  Gibbs  distributions  and  can,  therefore,  be  expressed 
in  terms  of  potential  functions. 

Denote  by  a  coeflScient  vector  for  the  clique,  (For  example,  the  exponential  argument  in 
Equation  15  could  be  written  using  Rf“^  =  dcdj.)  We  can  apply  a  more  general  MRF  model  that 
allows  us  to  rectify  some  of  the  limitations  of  the  Gaussian  prior  by  using  the  potential 


ye(f)  =  p(dtf). 


(18) 


The  function  p  is  monotone  increasing  but  not  necessarily  convex.  By  relaxing  the  squared  differ¬ 
ence  penalty  we  can  reduce  the  too-high  penalty  applied  to  discontinuities.  An  example  p  sets  p(x) 
=  min{|xl,T}^  where  T  is  a  user-defined  threshold.  This  choice  has  both  computational  and  the¬ 
oretical  disadvantages;  the  nonconvexity  of  the  model  makes  it  difficult  to  globally  minimize  and 
the  cost  function  is  constant  after  the  threshold  is  surpassed.  This  latter  feature  means  that  edges 
whose  magnitude  is  larger  than  the  threshold  will  be  sharply  reconstituted  in  the  reconstruction 
while  edges  smaller  than  T  will  be  smoothed. 

The  Huber  function, 

PW -\T^  +  2T\x-T\  i£\x\>T,  ^  ’ 


is  convex  and  so  avoids  the  problems  associated  with  nonconvex  p.  Since  the  penalty  for  jx]  >  T  is 
linear,  the  excessive  smoothing  problems  associated  -with  quadratic  penalties  will  also  be  reduced. 

An  (arguably)  more  useful  MRF  model  applies  variable  weights  to  the  quadratic  penalty: 
p(x)  =wlx‘^. 


Tikhonov’s  original  suggestion. 


(21) 


is  an  example  of  this  and  enforces  a  strict  smoothing  constraint  as  the  prior  information.  At  a  true 
image  discontinuity,  the  weight  vP{y)  can  be  set  to  zero  so  as  not  to  preclude  the  feature  firom  the 
ensemble  of  allowed  images.  Moreover,  the  weights  can  be  set  to  reflect  the  reliability  to  which  this 
prior  information  is  believed  to  hold. 
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This  variable  weight  method  generally  results  in  a  highly  nonlinear  (and  possibly  non-convex) 
reconstruction  algorithm  that  often  complicates  numerical  implementation.  In  addition,  there  are 
obvious  problems  with  setting  the  magnitudes  of  the  (location  dependent)  weights:  the  locations  of 
the  image  discontinuities  are  usually  part  of  the  sought-for  image  information.  The  variable  weight 
method  is  appealing,  however,  because  it  can  be  considered  to  be  a  more  general  version  of  Equation 
18  than  functions  of  the  form  of  Equation  19.  Moreover,  the  variable  weight  concept  can  be  readily 
extended  to  ‘fturn  on/off”  types  of  “constraint”  criteria  other  than  simple  smoothing. 

Let  w  denote  the  weight  vector  of  Equation  20  so  that  Wi  =  Choose 


I  Vc{{)  dc  =  f^dcWdlf,  (22) 

cGC 


where  W  is  the  weight  matrix  whose  main  diagonal  is  determined  by  w  (i.e.,  W  =  diag{w}). 

Substituting  Equations  5  and  9  into  Equation  14,  with  pf(f;fQ)  defined  by  Equations  17  and 
22,  we  have  a  representation  of  B{pf)  in  terms  of  the  parameters  w: 


B{W)  =1|F  -  Kfuvf  +  tr[K+K(X  +  R)-i] 

+  ^  (fMvXfMV  -  In  det[X(X  +  R)"^]  -  tr[R(X  +  R)"^])  +  constant , 


(23) 


where  X  =  2dcWdt ,  R  =  KtR^^K,  and  W  =  (X  +  R)-iKtR^iF. 

We  are  interested  in  minimizing  B(W)  over  the  positive  parameters  w.  In  general,  these 
parameters  may  be  chosen  to  be  any  value  greater  than  zero  and  a  general  minimization  algorithm 
can  be  developed  in  terms  of  gradients  of  B{W)  when  w  is  allowed  to  vary  in  this  “soft”  way. 
Alternately,  a  “hard”  choice  could  be  made  so  that 

_  /  ^little  5  if  the  feature  is  to  be  left  alone;  .  . 

^  u;big,  if  the  feature  is  to  be  readjusted.  ^  ^ 


Such  hard  choices  were  employed  by  Reference  12  and  effectively  result  in  discontinuity  detection. 
Various  methods  emplo3dng  soft  choices  have  also  been  proposed  and  yield  somewhat  more  stable 
and  robust  techniques  (c.f.,  References  14  and  15). 

For  illustration  purposes  in  the  present  discussion,  we  shall  employ  the  hard  weights  of  Equation 
24.  The  algorithm  we  use  for  choosing  w  is  then  very  simple: 

step  0:  set  =  t^big^Vj  and  calculate  Bq  =  5(W);  set  i  =  1. 
step  1:  set  Wi  =  lyuttie  and  calculate  B  =  B(W). 
step  2:  if  B  <  jBo  then  go  to  step  4. 
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step  3:  set  wi  =  Whig 

step  4:  set  i  — >  i  +  1;  if  i  >  M1M2  =  dim  W  then  stop,  otherwise  go  to  step  1. 

Once  X  =  2dcWdl  is  determined,  then  the  estimate  fMv  is  set. 

Figure  1  demonstrates  the  method  in  a  one-dimensional  setting.  The  data  consist  of  25  uniformly- 
distributed  points  that  sample  a  simple  function  consisting  of  exactly  two  “steps.”  These  data  were 
contaminated  with  additive  Gaussian  white  noise  (cr  =  5%  of  max(F))  and  are  plotted  in  the  figure. 
Also  plotted  in  the  figure  are  two  100-point  (and  so,  “expanded”)  reconstructions  developed  from 
the  data:  the  first  estimate  is  the  varying  weight  algorithm  discussed  above;  and  the  second  is 
the  “ordinary”  smoothed  reconstruction  with  constant  weights.  The  clique  coefficient  vector  was 
chosen  to  be  a  second-order  measure  of  smoothness  with  dc  the  coefficients  appropriate  to  a  finite 
difference  approximation  to  the  second-order  derivative:  d[f  =  /(yj+i)  --  2/(yi)  +  /(yi_i).  In  the 
figure.  Whig  —  100,  lyuttie  =  lO'^^®,  and  A  =  10^^. 


FIGURE  1.  4x  Expansion  Reconstruction  of  Simple  Two-Step  Function  With  a  =  5%  of 
max(F),  A  =  10”^,  u;big  =  100,  and  tunttie  =  10”^®. 


The  example  of  Figure  1  is  important  in  that  it  clearly  demonstrates  the  difference  between  the 
current  method  and  ordinary  polynomial  “fitting”  techniques.  The  abrupt  “jumps”  in  the  recon¬ 
structed  image — which  correctly  correspond  to  the  associated  jumps  in  the  data — are  not  modeled 
by  polynomial  interpolates.  Moreover,  while  the  noise  mitigation  that  occurs  between  the  jumps  is 
consistent  with  traditional  methods,  the  ability  to  identify  the  jumps  in  the  presence  of  noise  is  the 
focus  of  the  current  discussion. 
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ISAR  IMAGING 


ISAR  is  a  method  for  increasing  the  resolution  of  a  radar  system  by  (eifectively)  increasing  the 
size  of  the  systems’s  aperture.  When  a  radar  target  rotates  in  a  known  manner,  then  data  collected 
as  a  function  of  time  will  also  be  data  collected  across  a  synthetic  aperture.  Careful  assignment  of 
these  data  to  positions  on  this  synthetic  array  often  allow  final  image  resolution  that  is  many  times 
that  implied  by  the  dimensions  of  the  simple  radar  antenna. 

In  a  monostatic  scattering  situation  (in  which  the  transmitter  and  receiver  are  co-located  at 
range  R  and  time  t)  it  can  be  shown  (Reference  18)  that  the  weak  scatterer  approximation  far-field 
response  H  (k)  due  to  a  harmonic  excitation  of  a  target  can  be  written: 

H  {k,  e-,  R,  t)  « - 2 — ^ -  /  e^2ky  g-i2fcx  0  ^y,  (25) 

(27r)  R  Jm? 


This  is  a  linear  superposition  of  waves  radiating  from  a  location  on  a  plane  whose  normal  is  the 
(instantaneous)  axis  of  rotation.  The  angle  6  denotes  the  target  aspect  in  the  plane  and  we  have 
employed  the  small  angle  approximation.  Here,  h  (r)  is  the  local  scatterer  density  function  (scaled 
by  range),  uj  is  the  angular  frequency  of  the  incident  wave  with:  strength  Ho]  wave  vector  k  and 
speed  c.  The  parameters  u)  and  c  are  related  to  k  by  |k|  —  cufc,  and  the  factor  of  2  accounts  for  the 
two-way  travel  distance  from  the  radar  to  the  target  and  back  again. 

Equation  25  is  a  two-dimensional  Fourier  transform  that  can  be  readily  implemented  in  digital 
systems.  (Note,  however,  that  the  effect  of  approximating  the  polar  data  grid  by  the  rectangular 
grid  implied  in  Equation  25  is  the  introduction  of  image  artifacts  when  x  or  y  are  large.  We  will 
not  consider  these  artifacts  further.)  The  ISAR  Identity  25  neatly  splits  the  target  into  down-range 
and  cross-range  factors.  The  quality  of  the  target  reconstruction  will  be  determined  by  the  size  of 
the  target  resolution  cell  whose  down-range  and  cross-range  dimensions  are  inversely  proportional 
to  the  bandwidth  and  angular  aperture,  respectively. 

There  are  several  practical  difficulties  in  applying  Equation  25  to  the  problem  of  estimating  /i, 
and  these  are  discussed  in  the  literature  (References  18  and  19).  The  development  of  the  Fourier 
transform  relationship  between  h{x,y)  and  H{k^6)  has  relied  on  the  approximation  that  h  is  in¬ 
dependent  of  k  and  6.  This,  of  course,  can  only  be  completely  true  if  the  target  consists  of  non¬ 
interacting  and  non-shadowing  point  scatterers — but  the  approximation  is  often  justified  by  arguing 
that  the  bandwidth  is  sufficiently  small  and  the  angular  aperture  is  sufficiently  limited,  so  that  h 
is  “effectively”  independent  of  k  and  9  for  the  duration  of  the  measurement.  And,  in  practice,  this 
approximation  is  often  sufficient  for  a  good  portion  of  the  scattering  elements  (so-called  “scattering 
centers”)  in  a  typical  ISAR  image.  The  scattering  centers  that  do  not  accept  this  approximation 
consist  of  structural  objects  like  ducts  and  inlets,  corner  reflectors,  edges,  and  smooth  surfaces  whose 
specular  return  varies  with  aspect.  In  the  presence  of  these  complex  scattering  centers,  an  ISAR 
image  generated  by  using  Relation  25  will  display  “artifacts”  characterized  by  streaks  and  curves, 
which  often  extend  beyond  the  physical  support  of  the  target.  Current  research  efforts  carefully  an¬ 
alyze  these  artifacts  so  that  the  type  and  actual  location  of  the  responsible  scattering  center  might 
be  estimated. 

It  is  apparent  that  the  central  problem  in  modern  ISAR  imaging  is  one  of  obtaining  enough 
resolution  to  separate  the  simple  point  scatterers  from  their  more  complex  neighbors.  In  addition, 
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increased  resolution  is  required  for  accurate  imaging  of  the  streaks  and  curves  associated  with  the 
complex  scatterers  which,  in  turn,  is  necessary  for  follow-on  image  interpretation.  In  this  sense, 
the  ISAR  imaging  problem  is  quite  similar  to  the  image  expansion  problem  already  discussed:  the 
ejffects  of  noise  must  be  moderated;  image  point  structure  must  be  sharpened;  and  edges  should 
not  be  blurred  in  the  process.  The  central  difference  between  the  ISAR  problem  and  optical  image 
analysis  is  the  complex  values  of  the  ISAR  image  and  the  (inherent)  complex  processing  that  must 
be  performed  for  effective  reconstruction. 

For  illustration  purposes  we  will  employ  a  two-dimensional  second-order  “measure  of  smooth¬ 
ness”  that,  in  terms  of  the  image  coordinates,  becomes 


ye(f)  dc  =  ftdeWdtf  =  _  (26) 


with  the  coefEcient  vector  defined  by 

=  ai/(yr,s+l)  +  02/(yr,s)  +  ffl3/(yr,s-l) 

^r,s,2^  —  f’l/(yr-l,s+l)  +  i’2/(yr,s)  +  ^’3/(yr+l,s-l) 

,  (27) 

‘ir,s,3f  =  Ol/(yr-l,s)  +  a2/(yr.s)  +  a3/(yr+l,s) 
dj,s.4f  =  ^l/(yr-l,s-l)  +  &2/(yr,s)  +  ^s/Cyr+l.s+l), 


(where  yr,s  are  image  array  coordinates).  The  choices  ai  =  as  =  1,  a2  =  -2,  bi  =  =  l/\/2,  and 

62  =  —2/y/2  yield  the  finite-difference  approximation  to  the  second  derivative  of  /  (as  in  Equation 
21).  Here,  however,  we  are  applying  a  slightly  more  general  form  to  allow  for  complex  values  of 
these  coefficients. 

Figure  2  is  an  example  of  this  method  applied  to  synthetic  ISAR  data.  The  original  Mi  x  M2 
=  24 X  24  scene  is  displayed  in  2a.  An  iVixiV2  =  16x16  data  array  was  generated  by  under-sampling 
the  array  Kf  generated  from  the  original  scene.  This,  of  course,  results  in  reconstruction  artifacts, 
which  are  displayed  in  2b.  (Figure  2b  was  generated  by  (K^K)”^K'^F  with  F  the  noise-free  under- 
sampled  data.)  Then  a  noisy  undersampled  data  set  was  constructed  using  a  =  | -y/jpFjp/iVT]^, 
The  corresponding  “traditional”  reconstructed  image  (as  in  2b)  is  shown  in  Figure  2c.  Figure  2d  is 
the  variable  weight  reconstruction  with  A  =  100,  ^^;big  =  10,  and  ==  10”^. 
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FIGURE  2.  Example  ISAR  Image  Reconstruction,  (a)  Original  Mi  x  M2  =  24  X  24  seen 
Simple  (Fourier)  image  reconstructed  from  an  undersampled  (iVi  x  A2  =  16  X  16)  data 
without  noise  (to  explicitly  show  undersampling  artifacts);  (c)  Image  reconstructed  from 

undersampled  data  (with  cr  =  §  \/jjFp/iViiV2);  and  (d)  Variable  weight  reconstruction 
A  =  100,  lUbig  =  10,  and  =  10“^. 
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The  smoothing  constraint  that  we  applied  to  the  data  set  in  the  example  of  Figure  2  was  ad 
hoc  and  constructed  in  an  effort  to  reduce  the  effect  of  both  sidelobes  and  image  noise.  In  Equation 
27  we  set 


ai 


h 


02  =  —2, 


-1 

Ml 


) 


(28) 


This  has  the  effect  of  rotating  the  sidelobes  associated  with  an  image  element  by  7r/2  with  respect 
to  the  local  peak  of  the  element — and  at  the  same  time  does  not  “over  emphasize”  the  variations 
due  to  noise. 


DISCUSSION  AND  CONCLUSION 


Bayesian  (and  associated)  methods  that  assign  probability  density  to  image  “intensity”  can  be 
justified  in  many  circumstances.  When  the  data  are  complex- values,  however,  or  when  it  is  allowed 
to  take  on  negative  values,  these  kind  of  probability  assignments  may  not  be  useful  because  they 
discard  important  aspects  of  the  problem  (e.g.,  phase).  In  the  foregoing  discussion  we  have  addressed 
this  issue  by  associating  the  relevant  probabilities  to  an  image  ensemble  that  is  defined  by  the  noise 
process  and  the  a  priori  information  available.  This  method  is  generally  independent  of  data  “type” 
considerations  and,  in  particular,  is  not  associated  with  image  intensity. 

By  freeing  the  method  from  an  artificial  dependence  on  data  amplitude  we  are  able  to  build  a 
more  general  method  that  allows  us  to  compare  the  local  phase  values  of  a  reconstructed  (complex¬ 
valued)  image.  Our  approach  has  been  for  a  regularizational  functional  consisting  of  two  terms: 
the  first  of  which  models  the  effects  of  noise  and  measures  the  “energy”  difference  between  the 
data  and  the  reconstructed  image;  and  a  second  term  that  tracks  the  “information  difference 
between  the  reconstructed  image  and  a  prior  image  model.  This  model  can  be  used  to  enforce 
smoothness  constraints  and  we  have  given  several  (elementary)  examples  to  illustrate  how  that  can 
be  accomplished.  These  examples  are  simple  and  easy  to  implement,  but  the  smoothness  criteria 
may  not  be  optimal  for  ISAR  image  reconstruction.  Moreover,  our  implementation  of  the  functional 
“minimization”  required  to  determine  the  minimizing  (estimate)  image  was  a  simple  expedient  that 
involved  a  binary  variation  across  the  image  elements  and  can  almost  certainly  be  improved  upon. 

Our  current  efforts  are  concentrating  on:  determining  which  (phase  dependent)  functionals  are 
appropriate  to  more  general  ISAR  imaging  and  extending  the  optimization  search  algorithm  to  allow 
for  continuous  variation  in  W. 


14 


NAWCWPNS  TP  8319 


REFERENCES 


1.  K.  M.  Hanson.  “Bayesian  and  Related  Methods  in  Image  Reconstruction  From  Incomplete 
Data,”  in  Image  Recovery:  Theory  and  Practice,  ed.  by  Henry  Stark.  Orlando,  Fla.,  Academic 
Press,  1987. 

2.  A.  N.  Tikhonov  and  V.  Y.  Arsenin.  Solution  of  El-Posed  Problems.  New  York,  Wiley,  1977. 

3.  D.  Keren  and  M.  Werman.  “Variations  on  Regularization,”  in  Proceedings  of  the  10th  Inter¬ 
national  Conference  on  Pattern  Recognition,  16-21  June  1990,  Atlantic  City,  N.J.  Washington 
D.C.,  IEEE  Computer  Society  Press,  1990,  Pp.  93-8. 

4.  D.  M.  Titterington.  “The  Maximum  Entropy  Method  for  Data  Analysis,”  Nature,  Vol.  312, 
No.  22  (1984),  pp.  381-2. 

5.  V.  A.  Macaulay  and  B.  Buck.  “Linear  Inversion  by  the  Method  of  Maximum  Entropy,”  Inverse 
Problems,  Vol.  5,  No.  5  (1989),  pp.  859-74. 

6.  B.  R.  Frieden  and  A.  T.  Bajkova.  “Bayesian  Cross-Entropy  of  Complex  Images,”  Applied  Optics, 
Vol.  33,  No.  2  (1994),  pp.  219-26. 

7.  A.  P.  Sage  and  J.  L.  Melsa.  Estimation  Theorey  with  Applications  to  Communications  and 
Control.  Huntington,  Del.,  Robert  E.  Krieger,  1979. 

8.  I.  Csiszar.  “Why  Least  Squares  and  Maximum  Entropy?  An  Axiomatic  Approach  to  Inference 
for  Linear  Inverse  Problems,”  Annals  of  Stat.,  Vol.  19,  No.  4  (1991),  pp.  2032-66. 

9.  J.  E.  Shore  and  R.  W.  Johnson.  “Axiomatic  Derivation  of  Maximum  Entropy  and  the  Principle 
of  Minimum  Cross-Entropy,”  IEEE  Trans.  Inf  Theory,  Vol.  IT-26,  No.  1  (1980),  pp.  26-37. 

10.  T.J.  Cornwell.  “Is  Jaynes’  Maximum  Entropy  Principle  Applicable  to  Image  Construction?” 
in  Indirect  Imaging:  Measurement  and  Processing  for  Indirect  Imaging,  ed.  by  J.  A.  Roberts. 
Cambridge,  Mass.,  Cambridge  University  Press,  1984. 

11.  A.  Mohammad-Djafari  and  G.  Demoment.  “Estimating  Priors  in  Maximum  Entropy  Image 
Processing,”  in  International  Conference  on  Accoustics,  Speech,  and  Signal  Processing,  3-6 
April  1990,  Albuquerque,  N.Mex.  Piscataway,  N.J.,  IEEE  Service  Center,  1990,  Pp.  2069-72. 

12.  D.  Lee  and  T.  Pavladis.  “One- Dimensional  Regularization  with  Discontinuities,”  IEEE  Trans, 
on  Pattern  Analysis  and  Machine  Intelligence,  Vol.  10,  No.  6  (1988),  pp.  822-9. 

13.  D.  Terzopoulos.  “Regularization  of  Inverse  Visual  Problems  Involving  Discontinuities,”  IEEE 
Trans,  on  Pattern  Analysis  and  Machine  Intelligence,  Vol.  8,  No.  4  (1986),  pp.  413-24. 

14.  S.  S.  Sinha  and  B.  G.  Schunck.  “Discontinuity  Preserving  Surface  Reconstruction,”  in  Proceed¬ 
ings  of  the  IEEE  Conference  on  Computer  Vision  and  Pattern  Recognition,  4-8  June  1989,  San 
Diego,  Calif.  Piscataway,  N.J.,  IEEE  Service  Center,  1989,  Pp.  229-34. 


15 


NAWCWPNS  TP  8319 


15.  H.  C.  Andrews  and  B.  R.  Hunt.  Digital  Image  Restoration.  Englewood  Cliffs,  N.J.,  Prentice- 
Hall,  1977. 

16.  R.  L.  Stevenson,  B.  E.  Schmitz,  and  E.  J.  Delp.  “Discontinuity  Preserving  Regularization  of 
Inverse  Visual  Problems,”  IEEE  Trans.  Systems^  Man,  and  Cybernetics,  Vol.  24,  No.  3  (1994), 
pp.  455-69. 

17.  R.  R.  Schultz  and  R.  L.  Stevenson,  “A  Bayesian  Approach  to  Image  Expansion  for  Improved 
Definition,”  IEEE  Trans.  Image  Proc.,  Vol.  3,  No.  3  (1994),  pp.  233-42. 

18.  D.L.  Mensa.  High-Resolution  Radar  Imaging.  Dedham,  Mass.,  Artech  House,  1981. 

19.  A.  W.  Rihaczek  and  S.  J.  Hershkowitz.  Radar  Resolution  and  Complex-Image  Analysis.  Ded¬ 
ham,  Mass.,  Artech  House,  1996. 


16 


INITIAL  DISTRIBUTION 


4  Naval  Air  Systems  Command,  Arlington 
AIR-4.1.1  (1) 

AIR-4,1.8 
File  (1) 

Stegman  (1) 

AIR-4.10A  (1) 

2  Defense  Technical  Information  Center,  Fort  Belvoir 
2  Center  for  Naval  Analyses,  Alexandria,  VA 
Dr.  A.  Borden  (1) 

L.  Lynn  (1) 


ON-SITE  DISTRIBUTION 


1  Code  455560D,  W.  Katzenstein 

1  Code  4555 SOD,  D.  Reade 

2  Code  4713C0D 

G.  Hewer  (1) 

R.  Smith  (1) 

1  Code  472230D,  D.  Paolino 
1  Code  472G70D,  G.  Winkler 

1  Code  473A20D,  M.  Mumford 

2  Code  474400D 

B.  Borden  (1) 

S.  Chesnut  (1) 

1  Code  474T60D,  T.  Loftus 

4Code  4BL000D  (3  plus  Archives  copy) 


